Cloud services are everywhere and very popular. Cloud drives for file backup and synchronization are especially interesting.
File or hard-drive backups are relatively easy: you "just" send your files over the network to remote storage.
I have several customizations. First, I had actually patched Tigase itself:
I've reverted to the official tigase-server.jar, and that was enough to fix clustering, so this patch must break the cluster setup somehow.
There is also a custom VHostRepository class and the zrouter component. These don't seem to impact clustering ability, but unfortunately zrouter is unable to send messages to clients across nodes. Messages between clients across nodes work fine.
Here is the complete config:

config-type=--gen-config-def
--virt-hosts=api.example.com:comps=zrouter,pubsub.example.com:comps=zrouter,anon.example.com:+anon
--tls-jdk-nss-bug-workaround-active=true
--cluster-mode=true
--cluster-nodes=s12:password,s13:password
--debug=server
--debug-packages=fanout.tigase.ZmqRouter,fanout.tigase.VHostFileRepository
--vhost-anonymous-enabled=false
--vhost-register-enabled=false
--vhost-repo-class=fanout.tigase.VHostFileRepository
--vhost-file=/opt/tigase/etc/domains
--vhost-file-interval=30
--vhost-file-comp=zrouter
message-router/components/msg-receivers/ext-comp.active[B]=false
--comp-name-1=zrouter
--comp-class-1=fanout.tigase.ZmqRouter
zrouter/num-threads[I]=10
zrouter/in-spec=tcp://127.0.0.1:9200
zrouter/out-spec=tcp://127.0.0.1:9201
--tigase.xmpp.elements_number_limit=50000
VHostFileRepository reads a text file of domains on an interval and adds/removes domains within Tigase as necessary. The --vhost-file-comp part is significant: it means that every domain in this list is handled by zrouter.
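The add/remove computation the repository performs on each poll can be sketched in shell terms. This is a hypothetical sketch, not the actual VHostFileRepository code: the one-domain-per-line file format and the temp-file names are my assumptions.

```shell
# Hypothetical sketch of the per-poll diff: snapshot from the previous
# poll vs. the current contents of the domains file (one domain per line).
printf 'a.example.com\nb.example.com\n' > /tmp/domains.prev
printf 'b.example.com\nc.example.com\n' > /tmp/domains.cur

sort /tmp/domains.prev > /tmp/domains.prev.sorted
sort /tmp/domains.cur  > /tmp/domains.cur.sorted

# Domains present only in the current file must be added...
echo "add:"
comm -13 /tmp/domains.prev.sorted /tmp/domains.cur.sorted

# ...and domains present only in the previous snapshot must be removed.
echo "remove:"
comm -23 /tmp/domains.prev.sorted /tmp/domains.cur.sorted
```

With the sample files above this prints c.example.com under "add:" and a.example.com under "remove:"; the real component would then register/unregister those vhosts inside Tigase.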
The intent with this config is that the api domain, pubsub domain, and all domains discovered by VHostFileRepository should be handled by zrouter. The earlier route_fix patch makes it so Tigase doesn't hijack disco packets. This way zrouter can respond to disco packets on its own.
So I think the first issue to figure out is why zrouter can't send messages across the cluster. Code for the component is here:
For example, if a client connects to s13 and the zrouter on s12 tries to send a stanza with from="pubsub.example.com", then the Tigase on s12 returns a recipient-unavailable error rather than routing to s13.
You seem to have stripped the repository configuration, and that is the part that breaks clustering. Have you made changes in that area?
Alright, it seems there was some problem elsewhere in my config. For testing purposes, I've reduced it to a minimal config that works:

config-type=--gen-config-def
--virt-hosts=anon.example.com:+anon
--tls-jdk-nss-bug-workaround-active=true
--cluster-mode=true
--cluster-nodes=s12:password,s13:password
--debug=server
--vhost-anonymous-enabled=false
--vhost-register-enabled=false
message-router/components/msg-receivers/ext-comp.active[B]=false
Now each Tigase listens on port 5277 and clustered routing works. I can connect a client to each node (anonymous login) and send between them.
Next is to figure out why the additional config breaks clustering.
Care to share your init.properties file? Are there any other exceptions?
Possibly relevant:

2014-12-22 18:53:54.328 [main] ClusterConnectionManager.getDefaults() SEVERE: Can not instantiate items repository for class: null
java.lang.NullPointerException
	at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
	at java.util.regex.Matcher.reset(Matcher.java:308)
	at java.util.regex.Matcher.<init>(Matcher.java:228)
	at java.util.regex.Pattern.matcher(Pattern.java:1088)
	at java.util.regex.Pattern.matches(Pattern.java:1129)
	at tigase.db.RepositoryFactory.getRepoClass(RepositoryFactory.java:492)
	at tigase.cluster.ClusterConnectionManager.getDefaults(ClusterConnectionManager.java:620)
	at tigase.conf.ConfiguratorAbstract.setup(ConfiguratorAbstract.java:553)
	at tigase.conf.ConfiguratorAbstract.componentAdded(ConfiguratorAbstract.java:183)
	at tigase.conf.Configurator.componentAdded(Configurator.java:50)
	at tigase.conf.Configurator.componentAdded(Configurator.java:33)
	at tigase.server.AbstractComponentRegistrator.addComponent(AbstractComponentRegistrator.java:116)
	at tigase.server.MessageRouter.addComponent(MessageRouter.java:108)
	at tigase.server.MessageRouter.addRouter(MessageRouter.java:145)
	at tigase.server.MessageRouter.setProperties(MessageRouter.java:807)
	at tigase.conf.ConfiguratorAbstract.setup(ConfiguratorAbstract.java:580)
	at tigase.conf.ConfiguratorAbstract.componentAdded(ConfiguratorAbstract.java:183)
	at tigase.conf.Configurator.componentAdded(Configurator.java:50)
	at tigase.conf.Configurator.componentAdded(Configurator.java:33)
	at tigase.server.AbstractComponentRegistrator.addComponent(AbstractComponentRegistrator.java:116)
	at tigase.server.MessageRouter.addRegistrator(MessageRouter.java:131)
	at tigase.server.MessageRouter.setConfig(MessageRouter.java:700)
	at tigase.server.XMPPServer.start(XMPPServer.java:142)
	at tigase.server.XMPPServer.main(XMPPServer.java:112)
Yep, the hostnames ('s12' and 's13') are resolvable to each other (they are in /etc/hosts of each node), and running 'hostname' on each node returns the correct name. Config looks like this:
I'm not sure whether Tigase is listening on its cluster port. Using netstat before and after launch, I can determine that Tigase is listening on ports 5222, 5223, 5269, 5280, and 5290, but these ports all have other meanings. Notably, it is not listening on port 5277, which I believe is the default cluster port?
I have a nice component that, at a certain time, sends an XMPP message to a user. While I was testing with Psi and online users, it worked like a charm: the user got a message from email@example.com and everything looked fine.
If the user is offline, however, I get a 404 error like the one below:

2014-12-22 17:06:22.521 [in_0-message-router] MessageRouter.processPacket() FINEST: Processing packet: from=null, to=null, DATA=<message xmlns="jabber:client" from="firstname.lastname@example.org/Smack" type="error" to="email@example.com"><body>my message</body><action xmlns="http://www.domain.com/extensions/message#action" type="my action" id="u1lkX-8"/><error code="404" type="wait"><recipient-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas"/><text xml:lang="en" xmlns="urn:ietf:params:xml:ns:xmpp-stanzas">The recipient is no longer available.</text></error></message>, SIZE=453, XMLNS=jabber:client, PRIORITY=NORMAL, PERMISSION=NONE, TYPE=error
I suspect that I have to set a particular recipient at the packet level for delivery, but I couldn't find any indication of that... it just seems strange that from=null, to=null can be correct!
Can this be related to the non-delivered messages to offline recipients?
Thanks in advance,
I followed your suggestion for the extension: it improves readability and makes the protocol uniform. It looks like the smartest solution.
For the external component, for now I'll keep using an internal one for a while, but I see your point and I think in the future I may be tempted to use external components.
I can't find any documentation about Mobile v3 at http://docs.tigase.org/tigase-server/snapshot/Development_Guide/html/#mobileoptimizations; only v1 and v2 are described. Could you please add it?
If you specify the nodes manually and set passwords, you do not need a shared repository. Please make sure that the cluster node hostnames match the FQDNs of the machines, that they are resolvable, and that the Tigase instances can and do establish connections.
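For reference, the manual node list looks like this in init.properties (the hostnames and passwords are the same placeholders used in the configs earlier in the thread):

```
--cluster-mode=true
--cluster-nodes=s12:password,s13:password
```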
I don't know what is wrong. The JID is also included in the certificate, but it seems the problem is with the JID. Can you please tell me how to generate a certificate with the JID, in case the problem is in the certificate generation process?
It occurs to me that even if I solve this for users on our server, I would not be able to solve it for users coming from remote Tigase servers. I probably need to make our pubsub service capable of handling these stanzas out of sequence.
I still consider this a bug that should be solved somehow though.
I'm using Tigase 5.2.3.
We are using pubsub temporary subscriptions, and when a client sends a presence stanza to our pubsub service and immediately follows it with an iq for subscribing, there is a chance that the pubsub service receives these stanzas out of order. Our pubsub service rejects subscriptions from clients whose presence it doesn't have, and so if the iq is received before presence then the subscription is rejected.
I've tried to fix this with --nonpriority-queue=true and --queue-implementation=tigase.util.NonpriorityQueue, but every once in a while the stanzas still arrive out of order. I don't know if these options are broken or if there is some other way the stanzas could get reordered. To make sure this is not a problem in the pubsub component, I'm returning 1 from processingInThreads() during my tests.
I'm running Tigase without a shared database. Derby is probably being used, but there is really nothing in it. I use --vhost-repo-class to supply domains, and only anonymous user auth is supported. There are no rosters or offline messages.
However, I'm having trouble with cluster mode. There are two servers, each have their own hostname, and I'm specifying both nodes (and passwords) with --cluster-nodes in the init.properties files on each server.
I've noticed that a message sent to one Tigase instance but destined for a user connected to the other Tigase instance returns a recipient-unavailable error response. Just wanted to check whether this problem might result from having no shared database (in case it is used for node coordination).
The part about offloading the hard work to a separate server is very interesting; could you please direct me to a documentation page that goes deeper into this topic?
That's the core of the Tigase API for components. If you write a component using the Tigase API, you do not have to think or worry about how it is deployed. It can be deployed as an internal Tigase component or as an external component. This is transparent from a development point of view, as it is just a server configuration option. It is described in the development guide's section on component development and in the admin guide's section on deploying external components.
As for "interfering" with the AMP code: it is designed to be extensible. If you extend it by adding more plugins (actions and conditions), you are not interfering; you are just adding more actions and conditions, which would be triggered by a different AMP payload. So the existing logic would not be affected.
Actually, everything looks correct. You assign 8GB of RAM to the Java (Tigase) process, and the top command shows that the Java process has allocated 7.8GB, which is close to the maximum but still below it. Java shows that it uses about 2 - 2.5GB of heap, which is also correct. You have to be aware that the Java process usually allocates all the memory it is allowed to allocate, because it has its own internal memory management mechanisms. So even if it allocates 8GB of RAM, that does not mean the whole amount is used by the actual JVM application. In the case above, Tigase uses only 2 - 2.5GB of all the allocated memory, so it should also show about 5 - 6GB of free heap, which gives you plenty of room.
So the above comments are technical matters related to memory management. A more or less separate issue is whether Tigase uses too much memory. There is no simple answer to this question; it depends on many factors.

One of them is the average contact list (roster) size. You say you have 60k user connections. If each user has approximately 200 contacts in his roster, then the server keeps 12 million contacts in memory, plus some metadata about those contacts. This requires memory, of course.

You also say you have approximately 30k messages a day. But if all 30k are sent at the same time, there can be a moment when you have 30k messages per second and then nothing for the rest of the day. That is still 30k messages a day on average, but at that point Tigase needs a lot of memory to process 30k messages at once.

Another question is whether you have other traffic, such as presence. For 60k users with rosters of 200, assuming each user logs in only once a day and logs out once a day, this can generate more than 50M presence packets a day. On average that is just around 600 presences a second, but users usually log in at more or less the same time and log out at more or less the same time, so there can be peak traffic periods.
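The rough arithmetic can be checked directly. The multipliers below are my reading of the paragraph (presence exchanged with each contact in both directions, once at login and once at logout); depending on which events you count (e.g. presence probes), the daily total can exceed 50M as stated:

```shell
# Back-of-the-envelope check of the memory/traffic numbers.
users=60000
roster=200

# Contacts held in memory: 60k users x 200 contacts each.
contacts=$((users * roster))
echo "contacts in memory: $contacts"            # prints 12000000

# Presence packets per day: per contact, both directions, login + logout.
presence=$((users * roster * 2 * 2))
echo "presence packets per day: ~$presence"     # prints ~48000000

# Averaged over 86400 seconds in a day.
echo "presence per second on average: $((presence / 86400))"   # prints 555
```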
There are also other factors which may affect memory usage, such as group chat, stream compression, etc....
From our load tests we know that Tigase can handle up to 160k online users with an average roster size of 150 on a real hardware machine with 16GB of RAM.
We are looking at the RES column in top, which is increasing beyond the defined limit (Xmx).
ps ax | grep java
will give you a list of all Java processes running on the server. Find the one running Tigase and kill it with the command:
kill -9 PID
where PID is the process id of the Java process running Tigase.
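The two steps can be combined with pgrep/pkill, which match on the command line. The 'java.*tigase' pattern is an assumption about what appears in your process list; check the pgrep output first so you don't kill an unrelated Java process:

```shell
# Show the PID(s) of any Java process whose command line mentions tigase;
# -f matches the full command line, not just the process name.
pgrep -f 'java.*tigase' || echo "no matching Java process found"

# Once you have confirmed it is the right process, the same pattern
# kills it in one step:
# pkill -9 -f 'java.*tigase'
```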
Thanks for the response. I really appreciate it.
If the second instance is running, please let me know how to stop it, or how to fix this issue so that only one instance runs.
A quick response will be much appreciated.
This indicates that you already have a running server and you are trying to run a second instance (most likely with pid 3081).